Lab 1 - Operating System Perspective

Task: System Calls

Enter the chapters/software-stack/system-calls/drills/tasks/basic-syscall/ folder and run make. Then enter the chapters/software-stack/system-calls/drills/tasks/basic-syscall/support/ folder and go through the practice items below.

For debugging, use strace to trace the system calls from your program and make sure the arguments are set right.

Update the hello.asm and / or hello.s files to print both Hello, world! and Bye, world!. This means adding another write() system call.

Update the hello.asm and / or hello.s files to sleep before the exit system call.
You need to make the sys_nanosleep() system call, with the timespec structure. Find its ID here.
Update the hello.asm and / or hello.s files to read a message from standard input and print it to standard output.
You'll need to define a buffer in the data or bss section. Use the read system call to read data into the buffer. The return value of read (placed in the rax register) is the number of bytes read. Use that value as the 3rd argument of write, i.e. the number of bytes printed.
Find the ID of the read system call here. To find out more about its arguments, see its man page. Standard input descriptor is 0.
Difficult: Port the initial program to ARM on 64 bits (also called aarch64).
Use the skeleton files in the arm/ folder. Find information about the aarch64 system calls here.
Create your own program, written in assembly, doing some system calls you want to learn more about. Some system calls you could try: open(), rename(), mkdir(). Create a Makefile for that program. Run the resulting program with strace to see the actual system calls being made (and their arguments).

If you're having difficulties solving this exercise, go through this reading material.

Task: System Call Wrappers

Enter the chapters/software-stack/system-calls/syscall-wrapper/drills/tasks/support/ folder and go through the practice items below.

Update the files in the support/ folder to make read system call available as a wrapper. Make a call to the read system call to read data from standard input in a buffer. Then call write() to print data from that buffer.
Note that the read system call returns the number of bytes read. Use that as the argument to the subsequent write call that prints read data.
We can see that it's easier to have wrapper calls and write most of the code in C than in assembly language.

Update the files in the support/ folder to make the getpid system call available as a wrapper. Create a function with the signature unsigned int itoa(int n, char *a) that converts an integer to a string. It returns the number of digits in the string. For example, it will convert the number 1234 to the string "1234" (NUL-terminated, 5 bytes long); the return value is 4 (the number of digits of the "1234" string).
Then make the call to getpid; it takes no arguments and returns an integer (the PID, i.e. the process ID, of the current process).

If you're having difficulties solving this exercise, go through this reading material.

Task: Library Calls vs System Calls

Enter the chapters/software-stack/system-calls/drills/tasks/libcall-syscall/support/ folder and go through the practice items below.

Check library calls and system calls for the call2.c file. Use ltrace and strace.
Find explanations for the calls being made and the library call to system call mapping.

If you're having difficulties solving this exercise, go through this reading material.

Modern Software Stacks

Most modern computing systems use a software stack such as the one in the figure below:

[Figure: Modern Software Stack]

This modern software stack allows fast development and provides a rich set of applications to the user.

The basic software component is the operating system (OS) (technically the operating system kernel). The OS provides the fundamental primitives to interact with hardware (read and write data) and to manage the running of applications (such as memory allocation, thread creation, scheduling). These primitives form the system call API, or system API. An item in the system call API, i.e. the equivalent of a function call that triggers the execution of a functionality in the operating system, is a system call.

The system call API is well-defined, stable and complete: it exposes the entire functionality of the operating system and hardware. However, it is also minimalistic with respect to features, and it provides a low-level (close to hardware) specification, making it cumbersome to use and not portable.

Due to the downsides of the system call API, a basic library, the standard C library (also called libc), is built on top of it. Because the system call API uses an OS-specific calling convention, the standard C library typically wraps each system call into an equivalent function call, following a portable calling convention. Beyond these wrappers, the standard C library provides its own API that is typically portable. Part of the API exposed by the standard C library is the standard C API, also called ANSI C or ISO C; this API is typically portable across all platforms (operating systems and hardware). This API, going beyond system call wrappers, has several advantages:

portability: irrespective of the underlying operating system (and system call API), the API is the same
extensive features: string management, I/O formatting
possibility of increased efficiency with techniques such as buffering, as we show later

Analyzing the Software Stack

To get a better grasp on how the software stack works, let's take a bottom-up approach: we build and run different programs that start off by using the system call API (the lowest layer in the software stack) and progressively use higher layers.

System Calls Explained

A system call, or syscall for short, is a method used by applications to communicate with the operating system's kernel.

The need for syscalls comes from the way modern operating systems conceptually separate execution into kernel space and user space.

The kernel space manages the hardware resources such as CPU, I/O devices, disk or memory. Moreover, the kernel also provides an interface for the user space applications to interact with the hardware.

The user space is where applications and processes run. From the user space, we cannot directly access the hardware or perform privileged operations. We need to use syscalls to perform privileged operations such as accessing the hardware.

Below, you can see some examples of system calls and what resource they request from the kernel:

brk() is used to allocate memory
open() is used to access the file system and open a specific file
write() is used to access the file system and modify the contents of a specific file

System Call API Explained

Basic System Calls

The basic-syscall/support/ folder stores the implementation of a simple program in assembly language for the x86_64 (64 bit) architecture. The program invokes two system calls: write and exit. The program is duplicated in two files using the two x86 assembly language syntaxes: the Intel / NASM syntax (hello.asm) and the AT&T / GAS syntax (hello.s).

The implementation follows the x86_64 Linux calling convention:

system call ID is passed in the rax register
system call arguments are passed, in order, in the rdi, rsi, rdx, r10, r8, r9 registers

Let's build and run the two programs:

student@os:~/.../basic-syscall/support$ ls
hello.asm  hello.s  Makefile

student@os:~/.../basic-syscall/support$ make
nasm -f elf64 -o hello-nasm.o hello.asm
cc -nostdlib -no-pie -Wl,--entry=main -Wl,--build-id=none  hello-nasm.o   -o hello-nasm
gcc -c -o hello-gas.o hello.s
cc -nostdlib -no-pie -Wl,--entry=main -Wl,--build-id=none  hello-gas.o   -o hello-gas

student@os:~/.../basic-syscall/support$ ls
hello.asm  hello-gas  hello-gas.o  hello-nasm  hello-nasm.o  hello.s  Makefile

student@os:~/.../basic-syscall/support$ ./hello-nasm
Hello, world!
student@os:~/.../basic-syscall/support$ ./hello-gas
Hello, world!

The two programs end up printing the Hello, world! message at standard output by issuing the write system call. Then they complete their work by issuing the exit system call.

The write system call writes a buffer to the file referred to by the first argument, which is the file descriptor. File descriptors are going to be studied in-depth in future chapters. For now, it is enough for you to know that they are integers that behave like file handles. The 3 most common file descriptors are:

0 references the standard input (stdin)
1 references the standard output (stdout)
2 references the standard error (stderr)

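Use man 2 write and man 2 exit to get a detailed understanding of the syntax and use of the two system calls (section 2 of the manual documents system calls, while man 3 exit describes the exit() library function). You can also check the online man pages: write, exit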

We use strace to inspect system calls issued by a program:

student@os:~/.../basic-syscall/support$ strace ./hello-nasm
execve("./hello-nasm", ["./hello-nasm"], 0x7ffc4e175f00 /* 63 vars */) = 0
write(1, "Hello, world!\n", 14Hello, world!
)         = 14
exit(0)                                 = ?
+++ exited with 0 +++

There are three system calls captured by strace:

execve(): this is issued to load and run the new program (normally on behalf of the shell, here on behalf of strace); you'll find out more about execve in the "Compute" chapter
write(): called by the program to print Hello, world! to standard output
exit(): to exit the program

This is the most basic program for doing system calls. Given that system calls require a specific calling convention, their invocation can only be done in assembly language. Obviously, this is not portable (specific to a given CPU architecture, x86_64 in our case) and too verbose and difficult to maintain. For portability and maintainability, we require a higher level language, such as C. In order to use C, we need function wrappers around system calls.

System Call Wrappers

The syscall-wrapper/support/ folder stores the implementation of a simple program written in C (main.c) that calls the write() and exit() functions. The functions are defined in syscall.s as wrappers around the corresponding system calls. Each function invokes the corresponding system call using the specific system call ID and the arguments provided for the function call.

The implementation of the two wrapper functions in syscall.s is very simple, as the function arguments are passed in the same registers required by the system call. This is because the first three argument registers (rdi, rsi, rdx) are the same in the x86_64 Linux function calling convention and the x86_64 Linux system call convention.

syscall.h contains the declaration of the two functions and is included in main.c. This way, C programs can be written that make function calls that end up making system calls.

Let's build, run and trace system calls for the program:

student@os:~/.../syscall-wrapper/support$ ls
main.c  Makefile  syscall.h  syscall.s

student@os:~/.../syscall-wrapper/support$ make
gcc -c -o main.o main.c
nasm -f elf64 -o syscall.o syscall.s
cc -nostdlib -no-pie -Wl,--entry=main -Wl,--build-id=none  main.o syscall.o   -o main

student@os:~/.../syscall-wrapper/support$ ls
main  main.c  main.o  Makefile  syscall.h  syscall.o  syscall.s

student@os:~/.../syscall-wrapper/support$ ./main
Hello, world!

student@os:~/.../syscall-wrapper/support$ strace ./main
execve("./main", ["./main"], 0x7ffee60fb590 /* 63 vars */) = 0
write(1, "Hello, world!\n", 14Hello, world!
)         = 14
exit(0)                                 = ?
+++ exited with 0 +++

The trace is similar to the previous example, showing the write() and exit() system calls.

By creating system call wrappers as C functions, we are now relieved of the burden of writing assembly language code. Of course, there has to be an initial implementation of wrapper functions written in assembly language; but, after that, we can use C only.

Library Calls vs System Calls

The standard C library has primarily two uses:

wrapping system calls into easier-to-use C-style library calls, such as open(), write(), read()
adding common functionality required for our program, such as string management (strcpy()), memory management (malloc()) or formatted I/O (printf())

The first use means a 1-to-1 mapping between library calls and system calls: one library call means one system call. The second group doesn't have a standard mapping. A library call could be mapped to no system calls, one system call, two or more system calls, or it may depend (a system call may or may not happen).

The libcall-syscall/support folder stores the implementation of a simple program that makes different library calls. Let's build the program and then trace the library calls (with ltrace) and the system calls (with strace):

student@os:~/.../libcall-syscall/support$ make
cc -Wall   -c -o call.o call.c
cc   call.o   -o call
cc -Wall   -c -o call2.o call2.c
cc   call2.o   -o call2

student@os:~/.../libcall-syscall/support$ ltrace ./call
fopen("a.txt", "wt")                                                                                             = 0x556d57679260
strlen("Hello, world!\n")                                                                                        = 14
fwrite("Hello, world!\n", 1, 14, 0x556d57679260)                                                                 = 14
strlen("Bye, world!\n")                                                                                          = 12
fwrite("Bye, world!\n", 1, 12, 0x556d57679260)                                                                   = 12
fflush(0x556d57679260)                                                                                           = 0
+++ exited (status 0) +++

student@os:~/.../libcall-syscall/support$ strace ./call
[...]
openat(AT_FDCWD, "a.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
write(3, "Hello, world!\nBye, world!\n", 26) = 26
exit_group(0)                           = ?
+++ exited with 0 +++

We have the following mappings:

The fopen() library call invokes the openat and the fstat system calls.
The fwrite() library call invokes no system calls.
The strlen() library call invokes no system calls.
The fflush() library call invokes the write system call.

This all seems to make sense. The main reason for fwrite() not making any system calls is the use of a standard C library buffer. Calls to fwrite() end up writing to that buffer to reduce the number of system calls. Actual system calls are made either when the standard C library buffer is full or when an fflush() library call is made.

Note that on some systems, ltrace does not work as expected, due to now binding. To avoid this behaviour, you can force lazy binding (which ltrace relies on to work). An example can be found in libcall-syscall/support/Makefile. For system binaries, such as ls or pwd, however, the only alternative is to add the `-x "*"` argument to force the command to trace all symbols in the symbol table:

student@os:~$ ltrace -x "*" ls

You can always choose what library functions ltrace is investigating, by replacing the wildcard with their name:

student@os:~$ ltrace -x "malloc" -x "free" ls
malloc@libc.so.6(5)                                                    = 0x55c42b2b8910
free@libc.so.6(0x55c42b2b8910)                                         = <void>
malloc@libc.so.6(120)                                                  = 0x55c42b2b8480
malloc@libc.so.6(12)                                                   = 0x55c42b2b8910
malloc@libc.so.6(776)                                                  = 0x55c42b2b8930
malloc@libc.so.6(112)                                                  = 0x55c42b2b8c40
malloc@libc.so.6(1336)                                                 = 0x55c42b2b8cc0
malloc@libc.so.6(216)                                                  = 0x55c42b2b9200
malloc@libc.so.6(432)                                                  = 0x55c42b2b92e0
malloc@libc.so.6(104)                                                  = 0x55c42b2b94a0
malloc@libc.so.6(88)                                                   = 0x55c42b2b9510
malloc@libc.so.6(120)                                                  = 0x55c42b2b9570
[...]

If you would like to know more about lazy binding, now binding or PLT entries, check out this blog post.
